University
- North America > United States > California (0.14)
- North America > United States > Florida > Leon County > Tallahassee (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- (2 more...)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- North America > United States > Arizona (0.04)
- (3 more...)
- Information Technology (0.46)
- Materials (0.46)
Supplementary Material for Flat Seeking Bayesian Neural Networks Van-Anh Nguyen 1 Tung-Long Vuong
The proof can be found in Chapter 27 of [6]. For the non-flat version, the update is similar to the mini-batch SGD except that we add small Gaussian noises to the particle models. In Section 4.2 of the main paper, we provide a comprehensive analysis of the performance concerning In the experiments presented in Tables 1 and 2 in the main paper, we train all models for 300 epochs using SGD, with a learning rate of 0.1 and a cosine schedule. For the baseline of the Deep-Ensemble, SGLD, SGVB and SGVB-LRT methods, we reproduce results following the hyper-parameters and processes as our flat versions. ImageNet: This is a large and challenging dataset with 1000 classes.
- Oceania > Australia (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Vietnam (0.04)
- Oceania > Australia (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Asia > India (0.05)
- South America > Brazil (0.04)
- Africa > Ghana (0.04)
- (7 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.67)
- Health & Medicine > Therapeutic Area (0.69)
- Information Technology (0.67)
- Government > Regional Government (0.67)
- Media > Photography (0.48)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- (5 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Middle East > Cyprus (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
MANTRA: a Framework for Multi-stage Adaptive Noise TReAtment During Training
Zhao, Zixiao, Fard, Fatemeh H., Wu, Jie JW
The reliable application of deep learning models to software engineering tasks hinges on high-quality training data. Yet, large-scale repositories inevitably introduce noisy or mislabeled examples that degrade both accuracy and robustness. While Noise Label Learning (NLL) has been extensively studied in other fields, there are a few works that investigate NLL in Software Engineering (SE) and Large Language Models (LLMs) for SE tasks. In this work, we propose MANTRA, a Multi-stage Adaptive Noise TReAtment framework that embeds noise diagnosis and mitigation directly into the fine-tuning process of code-Pretrained Language Models (PTM) and code-LLMs. We first investigate the effect of noise at varying levels on convergence and loss trajectories of the models. Then we apply an adaptive dropout strategy guided by per-sample loss dynamics and Gaussian Mixture Model clustering to exclude persistently noisy points while preserving clean data. Applying to code summarization and commit intent classification, our experiments reveal that some LLMs are more sensitive to noise than others. However, with MANTRA, the performance of all models in both tasks is improved. MANTRA enables researchers and practitioners to reduce the impact of errors introduced by the dataset in training, saves time in data cleaning and processing, while maximizing the effect of fine-tuning.
- North America > United States > Michigan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Regional District of Central Okanagan > Kelowna (0.04)
- (7 more...)
Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
Dang, Long, Hapuarachchi, Thushari, Xiong, Kaiqi, Lin, Jing
Adversarial training is an effective method to improve the machine learning (ML) model robustness. Most existing studies typically consider the Rectified linear unit (ReLU) activation function and centralized training environments. In this paper, we study the ML model robustness using ten different activation functions through adversarial training in centralized environments and explore the ML model robustness in federal learning environments. In the centralized environment, we first propose an advanced adversarial training approach to improving the ML model robustness by incorporating model architecture change, soft labeling, simplified data augmentation, and varying learning rates. Then, we conduct extensive experiments on ten well-known activation functions in addition to ReLU to better understand how they impact the ML model robustness. Furthermore, we extend the proposed adversarial training approach to the federal learning environment, where both independent and identically distributed (IID) and non-IID data settings are considered. Our proposed centralized adversarial training approach achieves a natural and robust accuracy of 77.08% and 67.96%, respectively on CIFAR-10 against the fast gradient sign attacks. Experiments on ten activation functions reveal ReLU usually performs best. In the federated learning environment, however, the robust accuracy decreases significantly, especially on non-IID data. To address the significant performance drop in the non-IID data case, we introduce data sharing and achieve the natural and robust accuracy of 70.09% and 54.79%, respectively, surpassing the CalFAT algorithm, when 40% data sharing is used. That is, a proper percentage of data sharing can significantly improve the ML model robustness, which is useful to some real-world applications.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Overview (0.92)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)